Word Extraction Based on Semantic Constraints in Chinese Word-Formation
نویسندگان
چکیده
This paper presents a novel approach to Chinese word extraction based on semantic information of characters. A thesaurus of Chinese characters is conducted. A Chinese lexicon with 63,738 two-character words, together with the thesaurus of characters, are explored to learn semantic constraints between characters in Chinese word-formation, forming a semantic-tag-based HMM. The Baum-Welch re-estimation scheme is then chosen to train parameters of the HMM in the way of unsupervised learning. Various statistical measures for estimating the likelihood of a character string being a word are further tested. Large-scale experiments show that the results are promising: the F-score of this word extraction method can reach 68.5% whereas its counterpart, the character-based mutual information method, can only reach 47.5%.
منابع مشابه
Chinese Entity Relation Extraction Based on Word Co-occurrence
Chinese entity relation extraction is a part of entity relation extraction. According to entity relation extraction technology and the features of Chinese news corpus, this paper proposes a novel method for Chinese entities relation extraction. The method, named WCORE (word co-occurrence relation extraction), first measures the semantic similarity by word co-occurrence and then adopts pattern m...
متن کاملThe Semantics of the Word Istikbar (Arrogance) in the Holy Quran based on Syntagmatic Relations(A Case Study of Semantic Proximity and Semantic Contrast)
The word istikbar (arrogance) is one of the key words in the monotheistic system of the Quran, which has found a special status as a special feature of the opponents and adversaries of the call to the truth. Given the prominent role of this issue in the human life system and its provision of corruption and moral deviations, it is necessary to represent the nature of the elements that make up th...
متن کاملThe evolution of the meaning of the word nurse based on the classical texts of Persian literature
Background and Aim: The semantic evolution of a word over time is inevitable, indicating a social, political, religious or cultural process. Nurse is one of the words that has a significant presence in Persian literature texts and has been used in many different meanings such as slave, servan, maid, devotee, obedient, patient and preserver. The purpose of this study is to show its semantic ev...
متن کاملKeyword Extraction From Chinese Text Based On Multidimensional Weighted Features
This paper proposed to solve the problems of incomplete coverage and low accuracy in keyword extraction of Chinese text based on intrinsic feature of the Chinese language and an extraction method of multidimensional information weighted eigenvalues. This method combined theoretical analysis and experimental calculation to study the parts of speech, word position, word length, semantic similarit...
متن کاملWord clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering
The aim of this study is to determine the effect of word clustering method on vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this effect, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005